A "Magnum Opus" of Music Visualization
Paul and I recently finished our tutorial on Using Visualizations for Music Discovery. The fast-paced, three-hour presentation covered nearly every instance of music corpus visualization that we had found in the literature or in casual “fan”-based renditions. These ranged from visualizations of artist connections generated from millions of playlists by high-powered supercomputers all the way down to personal hand-drawn sketches on the back of a notebook.
I’ve embedded the slides from the tutorial above. However, the online slide format does not include any of the video segments that we showed, nor the demo visualizations that we generated live on our laptops. While our demo code (as well as the rest of the tutorial materials) is available online, I also prepared some backup video versions of my demos, just in case something went horribly wrong during the presentation. I thought it would be a good idea to post these videos online for posterity.
The demo focused on exploring the “acoustic space” of music involving the lute and the folk guitar. The two instruments can be similar acoustically, but they generally take part in culturally distinct genres. In audio-based genre classification, Renaissance music that features the lute can be confused with folk songs that feature the guitar. I wanted to focus on representative lute and folk music for the demo, and see whether visualization helps to show the confusion that a classifier might run into, and perhaps leads to ways of resolving it.
Rather than show all of the acoustic features at once, I chose to look at them one at a time. The first feature that I wanted to focus on was pitch information, or a general representation of what pitches, keys, etc. were present in the music. For this and all demo videos, the blue nodes correspond to lute music, while the red nodes correspond to folk music. However, there are some “mislabeled” songs, as we will discover later:
The visualization shows that adjacent songs do in fact have similarly pitched melodies, but can belong to very different styles or genres. This makes sense, since most popular genres use the same Western arrangement of major and minor chords, etc. So pitch information is essentially useless for separating the lute and folk music in our data.
The next acoustic data I looked at was timbre, which is a representation of the song’s tone or color. This type of data is typically the “best” information to use for genre classification, and I was curious to see how the timbre separated these genres:
Interestingly enough, the timbre space “almost” separates the music clearly. The softer tones of certain lute music are distinct from the rest of the folk music, which has harsher or brassier coloring. However, the lute is able to create a variety of timbres, and once the lutist uses a harsher plucking action, it sounds more like conventional folk guitar music. At this point, it may be possible to make a “pretty good” separation using a hyperplane, but as indicated by the visualization, there would most likely be some significant error using timbre alone.
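To make the hyperplane idea concrete, here is a minimal sketch; it is not the code used in the demo. The 2-D “timbre” coordinates are invented for illustration, and the hyperplane is simply the perpendicular bisector between the two class means (a nearest-centroid rule), which is only one of many ways to fit a linear boundary.

```python
# Hypothetical 2-D "timbre" coordinates, invented for illustration only.
# The last lute point stands in for a harshly plucked lute piece.
lute = [(1.0, 1.0), (1.5, 0.8), (0.8, 1.4), (1.2, 1.1), (3.0, 3.2)]
folk = [(3.0, 3.0), (3.5, 2.8), (2.8, 3.4), (3.2, 3.1)]


def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit_hyperplane(class_a, class_b):
    """Return (w, mid): the normal vector and a point on the hyperplane
    that bisects the segment between the two class means."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    w = tuple(b - a for a, b in zip(mu_a, mu_b))
    mid = tuple((a + b) / 2 for a, b in zip(mu_a, mu_b))
    return w, mid


def side(point, w, mid):
    """Signed score: negative falls on class_a's side, positive on class_b's."""
    return sum(wi * (pi - mi) for wi, pi, mi in zip(w, point, mid))


w, mid = fit_hyperplane(lute, folk)
lute_errors = [p for p in lute if side(p, w, mid) > 0]
folk_errors = [p for p in folk if side(p, w, mid) <= 0]
errors = len(lute_errors) + len(folk_errors)
print("misclassified lute points:", lute_errors)
print("misclassified folk points:", folk_errors)
```

With this toy data the soft lute pieces fall cleanly on one side, while the harshly plucked one lands among the folk points, which is the kind of error the timbre view suggests.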
Luckily, our database of music included non-acoustic features, such as tags. The tags were simple terms or phrases that were applied to songs by human listeners. Using appropriate text retrieval techniques allows us to see how the music was separated according to how it was tagged:
Here we see the two styles of music are clearly separated. However, there is a problem… I had originally used the tag data to identify/label lute and folk music in the first place. I quickly found out that some of this music was mislabeled… e.g. classical lute music was labeled as folk. So, even though the separation was clean, we have now exposed a new form of error.
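For readers who want a feel for what a text-retrieval treatment of tags can look like, here is a small sketch using TF-IDF weighting and cosine similarity. This is an assumption on my part about a reasonable technique, not a description of the demo code, and the song names and tags below are made up.

```python
import math
from collections import Counter

# Hypothetical songs and listener tags, invented for illustration only.
songs = {
    "lute_a": ["lute", "renaissance", "instrumental"],
    "lute_b": ["lute", "baroque", "instrumental"],
    "folk_a": ["folk", "guitar", "singer-songwriter"],
    "folk_b": ["folk", "acoustic", "guitar"],
}


def tfidf_vectors(tagged):
    """Map each song to a sparse {tag: weight} dict using tf * log(N / df)."""
    n_songs = len(tagged)
    doc_freq = Counter(tag for tags in tagged.values() for tag in set(tags))
    vectors = {}
    for name, tags in tagged.items():
        counts = Counter(tags)
        vectors[name] = {
            tag: (count / len(tags)) * math.log(n_songs / doc_freq[tag])
            for tag, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(tag, 0.0) for tag, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = tfidf_vectors(songs)
same_genre = cosine(vectors["lute_a"], vectors["lute_b"])
cross_genre = cosine(vectors["lute_a"], vectors["folk_a"])
print("lute_a vs lute_b:", round(same_genre, 3))
print("lute_a vs folk_a:", round(cross_genre, 3))
```

Songs that share tags end up close together and songs with no tags in common have zero similarity, so a clean split is expected whenever the labels themselves came from the tags. It also means a wrong tag goes straight into the picture unchallenged.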
The final approach that I tried was to mix the term and the timbre data in a “hybrid” representation:
This had the best result of any of the approaches that I had tried. Although it is no longer possible to separate the music by color (some of the red folk music is included among the blue lute cluster), I show that these red dots were actually mislabeled. In this fashion, the combined term + timbre features were able to “correct” each other, and a more valid representation of the genres was presented.
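One simple way to build such a hybrid, sketched below, is to normalize the tag vector and the timbre vector separately, scale each by a weight, and concatenate them. This is a guess at a workable recipe rather than the method used in the demo; the songs, the vectors, and the weights are all hypothetical.

```python
import math

# Hypothetical data, invented for illustration only.
# Each song: (tag vector [lute-ish, folk-ish], timbre vector [soft, harsh]).
songs = {
    "lute_1": ([1.0, 0.0], [1.0, 0.2]),
    "lute_2": ([1.0, 0.0], [0.9, 0.3]),
    "folk_1": ([0.0, 1.0], [0.2, 1.0]),
    "folk_2": ([0.0, 1.0], [0.3, 0.9]),
    # Tagged as folk, but with a soft, lute-like timbre.
    "mislabeled": ([0.0, 1.0], [0.95, 0.25]),
}


def unit(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def hybrid(tags, timbre, w_tags, w_timbre):
    """Concatenate the two unit-normalized blocks, each scaled by its weight."""
    return [w_tags * x for x in unit(tags)] + [w_timbre * x for x in unit(timbre)]


def nearest(query, w_tags, w_timbre):
    """Name of the song closest to `query` (Euclidean) in the hybrid space."""
    q = hybrid(*songs[query], w_tags, w_timbre)
    others = (name for name in songs if name != query)
    return min(
        others,
        key=lambda name: math.dist(q, hybrid(*songs[name], w_tags, w_timbre)),
    )


tags_only = nearest("mislabeled", w_tags=1.0, w_timbre=0.0)
mixed = nearest("mislabeled", w_tags=0.3, w_timbre=0.7)
print("nearest with tags only:", tags_only)
print("nearest with hybrid   :", mixed)
```

With tags alone, the mislabeled song sits with the folk songs; once timbre carries enough weight, its nearest neighbor becomes a lute piece. The weights are a tuning knob I made up for this sketch, and with equal weights on this toy data the tag block still wins, so the balance between the two feature types matters.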
Also, if you’ve made it this far, you should check out my co-presenter Paul’s blog. He’ll be live-blogging the rest of the conference as it continues through the week.