Some day, we'll all speak brain
As a PhD student, you keep wondering if you’re really progressing towards a general theme that you can work into a thesis, or if you’re just dashing around like a dog chasing shiny hubcaps. Whichever model I’m following, I’m realizing that I can’t really help it.
Academic pathologies aside, I had another shiny hubcap go by, and I’m off again. The topic du jour is efficient encoding of signals, such as sound, video, etc. The paper I’m interested in, “Efficient auditory coding”, was written by two Carnegie Mellon researchers, Evan Smith and Michael Lewicki. They have a lab website here, and the paper is here. Unfortunately, you’ll have to have a subscription to Nature to view the paper.
The basis of the paper is “spike encoding” of signals. The brain uses neuron pulses, or spikes, to encode information about a signal, such as audio data. These spikes are fairly sparse and describe the signal efficiently, with a minimal amount of neural activity. In fact, Smith and Lewicki argue that this spike code is functionally optimal for speech.
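To make the idea concrete, here is a minimal sketch of a spike-style code: greedy matching pursuit, where a sound is rebuilt from a handful of (time, kernel, amplitude) “spikes”. This is an illustration of the general technique, not the authors’ actual implementation; the gammatone kernels and all parameter values below are my own assumptions, stand-ins for the kernels the paper derives from auditory data.

```python
import numpy as np

def gammatone(freq, fs=8000, dur=0.02):
    """One gammatone kernel (an assumed stand-in for the paper's learned
    kernels), normalized to unit energy."""
    t = np.arange(int(dur * fs)) / fs
    g = t**3 * np.exp(-2 * np.pi * 1.019 * 24.7 * t) * np.cos(2 * np.pi * freq * t)
    return g / np.linalg.norm(g)

def spike_encode(signal, kernels, n_spikes=30):
    """Greedy matching pursuit: each 'spike' is (time index, kernel index,
    amplitude). At every step, find the kernel/offset that best matches the
    residual, record it as a spike, and subtract its contribution."""
    residual = signal.astype(float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for k, kern in enumerate(kernels):
            corr = np.correlate(residual, kern, mode="valid")
            i = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[i]) > abs(best[2]):
                best = (i, k, corr[i])
        t, k, a = best
        residual[t:t + len(kernels[k])] -= a * kernels[k]
        spikes.append((t, k, a))
    return spikes, residual

# Toy input: a decaying 600 Hz tone, encoded with a tiny 3-kernel dictionary.
fs = 8000
kernels = [gammatone(f, fs) for f in (300.0, 600.0, 1200.0)]
t = np.arange(fs // 10) / fs
signal = 0.8 * np.sin(2 * np.pi * 600 * t) * np.exp(-20 * t)
spikes, residual = spike_encode(signal, kernels, n_spikes=30)
```

The sparseness is the point: a 100 ms waveform of 800 samples is summarized by 30 spikes, and the residual energy shrinks with every spike added.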
This brings up the question: does the brain adapt to speech, or does speech adapt to the brain? Smith and Lewicki would argue for the latter (although the former does occur, such as a higher level of sensitization to one’s own dialect). Their experiments showed that the brain encodes speech at a near-optimal level given its neural coding characteristics. The brain in question, however, belonged to a cat, not a human. This seems to indicate that we use our specific speech patterns simply because our brains are efficient at handling them.
Smith and Lewicki suggest in another paper that music seems to be suited for this encoding scheme as well. I may be going out on a limb, but if these claims are true, it could revolutionize the way we understand music and speech. First off, there will be the obvious efficiency gains. Cell phone technology wouldn’t have to limit sound quality to scrimp and save on bandwidth. Movies and music could be stored in even less space (although space is no longer at the premium it used to be). The main problem with the method so far is that the encoding/decoding can be fairly hard for a computer to perform, mainly because it lacks the massively parallel computation structure the brain has.
Efficiency aside, I think this is only scratching the surface of what this method is capable of. If we’re characterizing signals in terms of an optimally efficient “brain language” (neural spike activity), what can this brain language tell us about the signal it’s describing? It’s already helped us understand why we’ve chosen certain sounds for our speech; perhaps it can also tell us why we’re drawn to certain patterns in music, or reveal correlations between musical and spoken phrases in terms of their spike encodings.
The main point here is that this method is a significant step towards getting computers to use the same kind of signal processing routines that brains use. If we’re concerned about true “human computer interaction”, this kind of discovery is extremely important because it potentially removes a “layer of obfuscation” between a computer and a human being. Exciting stuff!