April 25, 2009

Cognitive Phase Points in Social Network Sizes

Social networking has been a huge buzzword for the last few years.  Companies have sprung up around helping people keep in touch with friends, family, colleagues, and potential colleagues in a semi-public, semi-permanent format.  In typical “buzzword” fashion, social networking has been mercilessly exploited and shoehorned into dozens of contrived contexts.  Examples include Walmart trying to set up a social network for teens… who perhaps didn’t want to associate with the Walmart aesthetic, and iYomu, which tried to cater to an older demographic… typically the slowest population to adopt new technology, particularly new communication technology.

In addition to assuming that social networks can be applied to nearly any business endeavor, there was also the sense that a greater utilization of these networks would somehow improve some sort of social awareness, and by corollary, provide the participating users with greater ‘access’ to individuals in otherwise inaccessible or distant locations.

To a certain degree, this is true.  People can keep tabs on distant friends, family, and colleagues in a very informal fashion using Facebook and Twitter, but there have to be limits to how many meaningful social connections an individual can maintain at a minimum level of awareness…

The general sense seems to be that “more friends = better”, and entire careers seem to have been built on exploiting social networks for self-promotion.  However, turning your personal social network into a competitive ‘numbers game’ seems… well… wrong.  It turns out there are pretty well defined ‘averages’ for how many meaningful relationships an individual can maintain.  Basically, there are three thresholds for social connections.

  1. 5 individuals = A number influenced by our working (short-term) memory.  This seems to be the maximum number of individuals that you can become ‘intimately familiar’ with.  You might be able to predict how these individuals would respond to given situations, or finish their sentences for them.
  2. 15 individuals = The maximum number of individuals you can form a “deep trust” with.  These are individuals that you feel completely relaxed and unguarded around.
  3. 150 individuals = The maximum number of individuals you can “enumerate” and identify with in a meaningful fashion.

The 150 number is a far cry from the thousands of friends on Facebook/MySpace that many people seem driven to acquire.  In fact, the very act of ‘friending’ someone on Facebook may have undermined the meaning of the word ‘friend’.  Perhaps all that Facebook/MySpace needs is clearer semantics behind the connections its users form, but it does feel to me at times that the notion of an online ‘friend’ has become garbled… and it seems that limiting the number of connections you can make might be a start towards making them more useful.

Designing for social networks using these real-world social/cognitive limits might lead to some different interaction arrangements.  I had thought of the following:

  • For my “select 5”, I would give them the opportunity to directly connect/interrupt me at almost any time.  Richer communication channels (video/voice) would be enabled by default.
  • For my “trusted 15”, I would grant the opportunity to directly respond on any blog postings/etc. without explicit editorial permission.  I would publish personal status messages and inside jokes through a limited public interface.
  • For my “group of 150”, I would grant permission to see pictures of me and my other friends (I think face recognition technology would be great to automatically tag and filter photos so that I wouldn’t have to).  I would publish less personal/professional messages that they would be able to see.
  • Any individuals above that number would be able to see non-personally-identifiable pictures, blog posts, and professional messages.
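
The tiers above could be sketched in haXe roughly as follows.  This is only an illustration under my own assumptions: contacts are ranked by closeness starting at 1, closer tiers inherit the permissions of wider ones, and every name here (Tier, TierSketch, tierFor, canInterrupt, canSeePhotos) is hypothetical rather than part of any real social network API.

```haxe
// Hypothetical names throughout; a sketch of the tiers, not a real API.
enum Tier {
    Select;   // the closest 5
    Trusted;  // ranks 6 through 15
    Group;    // ranks 16 through 150
    Everyone; // anyone beyond 150
}

class TierSketch {
    // Assumes contacts are ranked by closeness, starting at 1.
    public static function tierFor(rank:Int):Tier {
        if (rank <= 5) return Select;
        if (rank <= 15) return Trusted;
        if (rank <= 150) return Group;
        return Everyone;
    }

    // Only the select 5 may connect/interrupt at almost any time.
    public static function canInterrupt(t:Tier):Bool {
        return switch (t) {
            case Select: true;
            default: false;
        }
    }

    // Photos of me and my friends are visible down to the group of 150
    // (assumes closer tiers keep the permissions of wider ones).
    public static function canSeePhotos(t:Tier):Bool {
        return switch (t) {
            case Everyone: false;
            default: true;
        }
    }

    static function main() {
        trace(tierFor(3));
        trace(canSeePhotos(tierFor(200)));
    }
}
```

The point of the hard thresholds is that moving someone into a closer tier means moving someone else out, which is what keeps the connections meaningful.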

In general, I think that as social networks mature… ‘less’ will mean more.

April 6, 2009

The Dissertation Marathon

I’m getting close (hopefully) to submitting my dissertation proposal (on music corpus visualization, natch).  This has been a goal of mine for a long time, and I feel like my enthusiasm alone has carried me up to this point.  So, maybe I should feel lucky that only recently have I started to feel very run down and disenchanted with the process of writing/editing/reviewing for 4-6 hours every day.

Anyways, I’ve started doing some things that I think help greatly, and since many of my friends/colleagues are also going through this same process, and mentioned the same problems, I thought I could share some of what I’ve started doing that seems to help.  I’ll give them here in order of importance:

  1. Mood:  It’s really easy to get into weird mood swings, particularly if you’re like me, and are no longer on campus, and don’t have a bunch of others around to keep you in check.  If you have serious emotional problems, then this advice is not for you.  However, I think most people will go through some sort of strange mood while writing; it’s just part of the process.  What has helped me a lot in this area has been a simple amino acid called L-Methionine.  This is a biochemical precursor to SAM-e, a more expensive naturally occurring compound associated with joint health and mood.  I take L-Methionine, and have noticed a marked improvement in mood with no adverse side effects.  I realize of course that none of this is proven according to the FDA, and some people consider it useless.  However, it has been safe for me.  YMMV.
  2. Sleep: I’ve always had problems sleeping, and it gets worse when I travel.  Some of my colleagues suggested Melatonin, which they use to get over jet lag.  I have to give it my endorsement as well.  I’ve tried prescription-strength sleep aids, but this is far better for me.  No side effects, non-habit-forming, and it doesn’t even make me dangerously drowsy.  I just feel ready to go to sleep, and sleep better than I do without it.
  3. Exercise:  It’s remarkable to me how exercise can change my mental state.  It only takes about 30 min. of jogging, etc. before I have a new perspective on things.  It’s also my “canary in the coal mine” for my weekly routine.  If I’m not having a good workout, it usually means I’m not getting enough sleep, or not eating right, etc.  Then it’s time for a change.
  4. Diet:  Some people are able to control all of these factors with diet and exercise.  More power to them.  I like more variety, so I tend to eat whatever I want (within reason).  I’ve got nothing really profound here:  the wrong kinds of food can throw everything off kilter, so it’s good to be aware of how they affect you.  Generally, I eat first, and ask questions later.  It doesn’t get me into too much trouble.

Anyways, these are just tips that seem to work for me, and aren’t supposed to be a guideline.  I also don’t consider myself one of the “all-natural” crowd, nor do I endorse any sort of chemical stimulant/barbiturate (everything I’ve referenced here has weaker side effects than caffeine/Tylenol PM).  Writing the dissertation is shaping up to be a long, drawn-out affair, and I’m very aware of the percentage of people who end up being A.B.D., which may be as high as 50% in some areas.  I am just going to use every tool I have available, and wanted to share them as well.

March 30, 2009

Tutorial: Music Visualization for Discovery


Paul Lamere and I recently got our tutorial accepted to ISMIR 2009.  It is titled Using Music Visualizations for Discovery and will be a survey of all the different ways that researchers have approached visualizing music collections, and how we can use these methods to find new and interesting music.

At first glance, this perhaps appears to be a bit of a ‘fringe’ topic.  Visualizations are often seen as more of an art form than a straightforward research tool.  However, I believe that music is a special form of information.  On one hand it behaves according to a set of physical rules, following certain spectral behaviors as an acoustic signal.  On the other hand, it behaves as a cultural icon, following another set of rules in a network of contextual associations and systems of meaning.  On yet another hand, it is understood in a musicological system of compositional symbols and structures.  To study and understand only one aspect of music is to have an incomplete grasp of the role it plays in the other systems.  You could spend an incredible amount of time understanding how musical information is related in each context.  Perhaps this is why music information retrieval projects spanning different music metadata types are somewhat rare.

Paul and I are believers that visualizations can be effective “big picture” overviews of music corpus information in many different contexts, and can be a great first step towards understanding how each form of music metadata reflects an important way of understanding music.  We hope to use our combined expertise to produce a useful overview of the current state of the art in music corpus visualization, as well as provide some useful open source tools for researchers to investigate their own musical information visually.

We’ve been gathering a lot of existing relevant projects already, and have tagged them all on del.icio.us.  There are a lot of great projects on there, and it’s worth a look.  If you think we’ve missed something, tag the respective website with “MusicVizIsmir2009” and we’ll check it out and add it to our collection.

Paul has some information available on his blog…. but just don’t believe everything he says about me, I’m the understudy in this group :)

March 2, 2009

Misconceptions of dividend producing stock

(Self Disclosure:  My father is an investment manager whose portfolios and strategies include dividend producing companies.)

This is way off topic for me, but I’ve been really bothered by all of the anti-wealth sentiment that seems to be growing in the US… that somehow people with a lot of money are the cause of the economic collapse, or that they are hurting the economy by continuing to take their pre-collapse salaries.  This sentiment is routinely expressed (of course) towards the CEOs of big firms, but also now towards college basketball coaches.

Apparently, limiting the salaries of corporate/organization heads is not enough, and so right now there is growing sentiment that dividends should be taxed higher, since many heads of companies invest in these stocks to generate income, and by doing so avoid higher tax rates on salaries.

The Christian Science Monitor article making this argument goes on to assert that investment in dividend producing stocks led to greater speculation, which in turn led to the current market state.

One example of “speculative dividend investment gone awry” is the Madoff funds, where individuals and companies bought into his Ponzi scheme, and ended up paying taxes on dividends they never received.  However, this had nothing to do with the real culprit of the crash, the mortgage-backed security.  The folks that invested with Madoff were duped and robbed.  They have nobody to blame but themselves for getting hoodwinked by one of the oldest investment cons in the book.  It doesn’t mean the underlying dividend system is flawed, and it doesn’t mean the dividend system contributed to the current market situation.

The reason why dividend income is taxed lower is that it has already been taxed.  The companies that offer dividends have already paid taxes on their earnings.  Companies (not funds) that offer dividends typically are older, more stable companies that have limited opportunity for growth.  By offering dividends, they offer an incentive for investment over another faster growing, but less established company.

The Christian Science Monitor article is wrong on several points.  Investing in dividend producing companies is almost always less speculative than investing in companies that don’t pay dividends.  You are typically investing in companies that are producing positive cash flow, and in companies that have been around for a while.  This is far safer than investing in other types of companies/funds.  This method of investment is also not the sole domain of the super-rich.  Anyone can do it, and it remains a powerful tool for people looking to make safer investments in the market.

The government has meddled with investment strategies at its own peril.  They encouraged us to buy homes we couldn’t afford, and directly brought about arrangements for loan agencies to buy and sell bad debt. The last thing they should be doing now is discouraging us from investing in strong, stable companies.

It is my belief that we are going to get out of this bad market through “back to basics” economic policies and investment strategies, not by eating the rich.

February 6, 2009

Iterable <Iterable<T>> in haXe

One of the (few) limitations of haXe is that it does not structurally type collections recursively, either through the type itself, or a type parameter.  So, writing a function header of:

public function (it:Iterable<Iterable<T>>) : …{}

will not work in the intended way.  In fact, trying to pass a List<List<T>> or [[1,2,3],[4,5,6]] for “it” will cause errors.

This is a limitation for functional programming routines, especially functions that handle collections of collections like chain, zip, etc.

In order to handle these situations, it’s necessary to come up with a work-around.  For relevant methods, I have a function header that looks like:

function(it:Iterable<Dynamic> , ?nonIterableBehavior:Dynamic->Iterator<Dynamic>):…

The “it” parameter is a simple single collection of any Type (Dynamic).  This will accept Iterable<Iterable<T>>, but also Iterable<T>, or a mixed Iterable of both iterable/non-iterable elements.  I’m mainly interested in the first case, but I’ll have to handle the latter two cases as well.

In order to handle all of the cases, it’s necessary to detect if an individual element T is actually an Iterable<T2> (where T2 is some other type).  The straightforward method would be to use Reflect.hasField(d, "iterator"), where d is the element being tested.  However, this will not detect the iterator() in Arrays.  Also, since we’re using a Dynamic type, we’ll have to test for null.  In the end, this should work for an Iterable tester:

public static function isIterable(d:Dynamic):Bool{
    // Arrays get their own check, since hasField does not see their iterator()
    return (d != null && (Reflect.hasField(d,'iterator') || Std.is(d, Array)));
}

As long as the object has an “iterator” field function that behaves in a consistent way, this will work for it.

Now that we can test for Iterables, we can handle non-iterables.  In relevant functions I use a function called nonIterableBehavior that transforms these non-iterables into Iterators, and then pass the resulting iterator along to chain as if it had come from a proper Iterable.  In my own classes, nonIterableBehavior defaults to something I thought would be appropriate.  For instance, in chain(), the default nonIterable function applies IterTools.repeat("x",1) to the non-iterable element.  In zip(), the default nonIterable function instead skips any nonIterable elements (a function that returns null signals that the element should be skipped).

So… the net effect is that I can now call something like

ListTools.chain([[1,2,3,4],5,[6,7,8]])

and get back a List<Dynamic>:

{1,2,3,4,5,6,7,8}

From here, if I need to do anything further with the elements in the list, I’ll have to type check everything with Reflect/Std.is methods like Std.is(x,Float), etc.  However, in most of my cases, just printing out a list is useful.
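
To make the pieces above concrete, here is a minimal sketch of how a chain-like function could combine isIterable with nonIterableBehavior.  It is not the library’s actual code: ChainSketch is a hypothetical class, and the default behavior shown (wrapping a lone element in a one-item array) is my stand-in for IterTools.repeat.  It uses Std.is to match the code above; newer compilers call this Std.isOfType.

```haxe
class ChainSketch {
    // The iterable test from above. Reflect.hasField is only guaranteed
    // for anonymous structures, so class instances may need another check.
    public static function isIterable(d:Dynamic):Bool {
        return d != null && (Std.is(d, Array) || Reflect.hasField(d, "iterator"));
    }

    // Hypothetical stand-in for ListTools.chain, not the library's code.
    public static function chain(it:Iterable<Dynamic>, ?nonIterableBehavior:Dynamic->Iterator<Dynamic>):List<Dynamic> {
        if (nonIterableBehavior == null) {
            // Assumed default: wrap a lone element so it is yielded once.
            nonIterableBehavior = function(x:Dynamic):Iterator<Dynamic> {
                return [x].iterator();
            };
        }
        var out = new List<Dynamic>();
        for (el in it) {
            var inner:Iterator<Dynamic>;
            if (isIterable(el)) {
                var nested:Iterable<Dynamic> = el;
                inner = nested.iterator();
            } else {
                inner = nonIterableBehavior(el);
            }
            if (inner == null) continue; // null means "skip this element"
            for (x in inner) out.add(x);
        }
        return out;
    }

    static function main() {
        var mixed:Array<Dynamic> = [[1, 2, 3, 4], 5, [6, 7, 8]];
        trace(chain(mixed)); // a List holding 1 through 8, in order
    }
}
```

Passing a nonIterableBehavior that always returns null gives the zip-style “skip” behavior with the same loop.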

December 14, 2008

Signals and Slots

I spent some time this weekend implementing a signals and slots class for haXe.  This method of event handling is often useful for managing GUIs, since GUI control code is often defined in separate classes from GUI display code, and yet they’re very similar in terms of input/output.  Provisions/expectations of the controls and views (resp.) are often identical in terms of their parameters (Mouse moves to position X:Y… Cursor Image moves to position X:Y, etc.).  Rather than falling back on boilerplate Event code, signals and slots let you stitch together relevant function calls from outside the Class(es) that define them.

The drawbacks that I’ve run into (in haXe) are that I’ve had to rely heavily on Dynamic types, which obviously can’t be checked at compile time.  Furthermore, memory leaks are problematic if you’re not careful to remove all the signal handlers.  However, I think this still would be useful in certain situations.
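
For readers who haven’t seen the pattern, here is a minimal sketch of the idea.  It is not the class I published: Signal, connect, disconnect, emit, and the Point typedef are hypothetical names, and this version uses a type parameter instead of Dynamic, which keeps compile-time checking but limits each signal to one argument type.

```haxe
// Hypothetical minimal signal; names are illustrative, not the library's API.
typedef Point = {x:Int, y:Int};

class Signal<T> {
    var slots:Array<T->Void>;

    public function new() {
        slots = [];
    }

    // Register a slot (any function taking a T) to be called on emit.
    public function connect(slot:T->Void):Void {
        slots.push(slot);
    }

    // Drop a slot; forgetting this keeps the handler (and whatever it
    // references) alive, which is the memory-leak risk.
    public function disconnect(slot:T->Void):Bool {
        return slots.remove(slot);
    }

    // Call every connected slot. Iterating over a copy lets a slot
    // disconnect itself without disturbing the loop.
    public function emit(value:T):Void {
        for (slot in slots.copy()) slot(value);
    }
}

class SignalSketch {
    static function main() {
        var mouseMoved = new Signal<Point>();
        var moveCursor = function(p:Point):Void {
            trace("cursor image moves to " + p.x + ":" + p.y);
        };
        mouseMoved.connect(moveCursor);    // stitched together from outside
        mouseMoved.emit({x: 10, y: 20});   // moveCursor runs
        mouseMoved.disconnect(moveCursor);
        mouseMoved.emit({x: 30, y: 40});   // nothing is connected now
    }
}
```

Neither the control code that emits mouseMoved nor the view code that defines moveCursor needs to know about the other; the connection is made by whoever owns both.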

Check it out on Google Code!

October 30, 2008

Lambda on the Web. Or, how I learned to stop worrying and love linked lists.

Arrays and linked lists are the two classic datatypes for storing “collections of things”.  Arrays typically are preferred in conventional programming tasks.  They take up less memory, have better data locality, and are faster for look-ups, provided you know the index of what you’re looking for.  However, I’ve recently come to the opinion that a lot of common web-oriented tasks just are not well-suited for arrays.  I’m not saying arrays aren’t useful, but that lambda type operations on linked-list data structures are starting to make a lot of sense to me.

One reason that linked-lists are more useful is that web developers are usually not coding with heavily optimized, low-level languages like C.  Many developers will code for virtual-machine platforms that do not allocate and handle arrays optimally for speed or efficiency.  Adobe’s AVM2 is a good example of this, and several people have noticed that linked-lists can outperform arrays for a lot of relevant tasks.  While there are workarounds for working efficiently in arrays, these can introduce a lot of cognitive overhead, especially when working with complex objects (such as XML/JSON data types).

On the server side, linked-lists are useful because it’s not always possible to store a simple index of items (people, pages, clicks, etc.) on one machine.  This is one reason that Google has started to work more with Lambda type operations such as MapReduce, and to spread the calculation of PageRank scores over multiple machines in a flexible and more fail-safe environment.  Lambda and linked list data types go together quite well because Lambda methods are normally intended to be “stateless.”  For each step, a typical Lambda method usually only concerns itself with an individual element, a function, and some output element.  For this reason, they can work on successive individual elements of a linked list without having to keep the whole list or index in memory.  The drawbacks to linked lists are that you don’t have random access to any of the elements (you have to go through them in order), and that you incur a significant amount of memory overhead for each element (each element has to contain a pointer to the next element, which essentially is a memory location… i.e. on a 32 bit machine, it’ll take at least 4 bytes).  So, if you’re only storing (common) 4-byte Integers in a linked-list, then the space required to store the pointers is going to be equal to the space taken by the data itself.  However, keeping these limitations in mind, they still can be very useful in appropriate contexts.

One of the reasons that I’ve been interested in haXe is that it includes a Lambda class as part of its API.  This fact, combined with its concise type-checking facilities, means that you can create very powerful Lambda type operations, and implement them with strong type checking.  This is great for performance and debugging, and haXe’s target options for client side and server side could make this type of programming extremely useful.
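
As a small illustration of the “stateless” style, here is a sketch that uses the standard Lambda.map and Lambda.fold on a List.  LambdaSketch and totalDoubled are hypothetical names made up for this example; the only assumption about the API is that map returns something iterable (depending on the compiler version, that may be a List or an Array).

```haxe
class LambdaSketch {
    // Each callback sees a single element (plus the running total in fold),
    // never the whole list or an index into it.
    public static function totalDoubled(items:Iterable<Int>):Int {
        var doubled = Lambda.map(items, function(n:Int):Int return n * 2);
        return Lambda.fold(doubled, function(n:Int, acc:Int):Int return n + acc, 0);
    }

    static function main() {
        var clicks = new List<Int>();
        clicks.add(3);
        clicks.add(5);
        clicks.add(8);
        trace(totalDoubled(clicks)); // (3 + 5 + 8) * 2 = 32
    }
}
```

Because both callbacks are explicitly typed, passing a function that returns a String where an Int is expected is caught at compile time rather than at run time.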

So to summarize, arrays’ performance benefits can be lost when dealing with client/server side virtual machines, or in large scale server side information processing.  I think it’s worth checking into some functional programming techniques to improve performance or scalability for some common web site tasks, and I think haXe is a wonderful place to start.

To this end, I’ve started to put together a library that extends the basic functionality of the haXe Lambda class, and offered it as part of an open source haXe library called “sugar-hx”.  The library includes functions taken from different Lambda method implementations in other languages.  It currently includes scan, unfold, zip, unique, bifurcate, as well as several “grouping” functions.  If you use it, or have suggestions, let me know!
