Recommendation systems are a wonderful tool for finding meaningful content and items online. Systems such as Amazon’s personalized storefronts or iTunes’ recommendation technology can help you deal with the “tyranny of choice” by giving you a simple list of items that fit your profile.
The lists of recommendations these systems return often carry little to no information about why the items were recommended. You don’t know whether a recommendation was “strong” or just a wild guess. The user is simply expected to trust what the system returns.
I wanted to address this problem by exposing the underlying structure of the recommendation system to the user. Other systems use different methods (such as Netflix’s collaborative filtering approach), but in the case of MyStrands, the recommendations come from a complex network of connections between songs. I wanted the user to be able to browse through music according to how the recommender engine correlates songs. Instead of returning a list of “best bets”, I wanted to show the strengths of the connections between songs graphically.
There are a number of problems with this approach. It wasn’t practical for me to calculate the structure of the entire network of music, so I had to come up with a way of “querying” the network to return a smaller, more relevant section. To do this, I treated a playlist as a query. I extracted all the “neighbor” songs that were directly attached to the playlist’s songs in the network (that is, the neighbor song and the playlist song appeared together on one or more playlists). The connection weights between songs are essentially their recommendation strengths, so the extracted network “neighborhood” contains a large number of quality recommendations for the playlist.
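To make the “playlist as query” idea concrete, here is a minimal sketch in Python. It assumes, hypothetically, that the song network can be approximated by a co-occurrence graph in which an edge’s weight is the number of playlists two songs share; the real MyStrands weights are not described here, and the function names are illustrative only. The query keeps the playlist’s songs, every song directly attached to them, and the edges among that set.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(playlists):
    """Hypothetical stand-in for the song network: an edge's weight is the
    number of playlists on which both songs appear."""
    graph = defaultdict(lambda: defaultdict(int))
    for playlist in playlists:
        for a, b in combinations(sorted(set(playlist)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def extract_neighborhood(graph, query_playlist):
    """Treat a playlist as a query: keep the query songs plus every song
    directly attached to one of them, with all edges among that set."""
    seeds = set(query_playlist)
    nodes = set(seeds)
    for song in seeds:
        nodes.update(graph.get(song, {}))  # .get avoids creating empty entries
    return {
        a: {b: w for b, w in graph[a].items() if b in nodes}
        for a in nodes
        if a in graph
    }


# Toy data: "hit" shares playlists with songs unrelated to the query.
playlists = [
    ["a", "b", "c"],
    ["b", "d"],
    ["c", "e", "hit"],
    ["x", "y", "hit"],
]
hood = extract_neighborhood(build_cooccurrence_graph(playlists), ["a", "b"])
```

Querying with ["a", "b"] pulls in "c" and "d" as neighbors, but drops the edges from "c" to songs outside the neighborhood, so only a small, relevant slice of the network has to be processed.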
However, this causes a new problem. Popular songs tend to behave like “hubs” that connect to many other songs in the network, so this process selects them often and they end up in many neighborhoods. Most of the time they aren’t especially relevant to the playlist you’ve uploaded, but because they’re popular they appear on a wide variety of playlists, and so, according to MyStrands’ recommendation metrics, they become valid recommendations for many different songs. To counter this, I re-weighted the connections in the network neighborhood according to a “participation ratio”: a measure of how many connections each song has within the neighborhood versus how many it has outside of it. The technique is a lot like TF-IDF weighting in text retrieval, where terms that occur in many documents are down-weighted.
Once I had the values for these weights, I could generate a “map” of the network that emphasizes the characteristic structures of the neighborhood. The resulting maps often look quite different from most network maps. I’m going to write more about the shapes and structures that appear in these maps and what they indicate, but in the meantime, you can play around with this “mapping tool” yourself on the MyStrands labs site. Let me know what you think!
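The participation-ratio re-weighting can be sketched as follows. The exact formula isn’t given above, so this assumes a hypothetical one: a song’s ratio is the fraction of its connections (counted, not weighted) that stay inside the neighborhood, and each neighborhood edge is scaled by the ratios of both endpoints. Like IDF, this damps hubs whose connections mostly lead elsewhere. The graphs are plain dicts of dicts, and all names are illustrative.

```python
def participation_ratio(full_graph, neighborhood, song):
    """Hypothetical ratio: share of a song's connections in the full network
    that stay inside the neighborhood (1.0 = fully local, near 0 = hub)."""
    total = len(full_graph.get(song, {}))
    if total == 0:
        return 0.0
    inside = sum(1 for other in full_graph[song] if other in neighborhood)
    return inside / total


def reweight(full_graph, neighborhood):
    """Scale each neighborhood edge by the participation ratios of both
    endpoints, damping hubs much as IDF damps common terms."""
    ratio = {s: participation_ratio(full_graph, neighborhood, s) for s in neighborhood}
    return {
        a: {b: w * ratio[a] * ratio[b] for b, w in edges.items()}
        for a, edges in neighborhood.items()
    }


# Toy network: "hit" links to the query songs but mostly to outside songs.
full_graph = {
    "a": {"b": 2, "hit": 1},
    "b": {"a": 2, "hit": 1},
    "hit": {"a": 1, "b": 1, "x": 3, "y": 3, "z": 3},
    "x": {"hit": 3},
    "y": {"hit": 3},
    "z": {"hit": 3},
}
neighborhood = {
    "a": {"b": 2, "hit": 1},
    "b": {"a": 2, "hit": 1},
    "hit": {"a": 1, "b": 1},
}
weights = reweight(full_graph, neighborhood)
```

Here "hit" keeps only 2 of its 5 connections inside the neighborhood, so its edges shrink to 0.4 of their original weight, while the locally coherent edge between "a" and "b" is unchanged.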