I’ve been at the NetSci conference in my backyard for the last week or so. It’s an ambitious effort to bring together some of the leading researchers in network science across many different fields and areas of application.
One thing has been made very clear to me. While there are some striking similarities between networks in many different contexts (see Barabasi’s excellent “Linked” for more on this), there is also a fundamental difference in the nature of networks, and the relevance of the measurements that you can extract from them.
There seem to be two very different network types under consideration at this conference: the first type has “essential” links, while the second has what I’ll call “bayesian” links. In the former type of network you can essentially rely on the link characteristics, usually because a large amount of effort has gone into verifying and “canonizing” the network structure.
However, there are also networks in which the nodes and the links between them cannot be completely established, for reasons of time, size, etc. In these networks you can only assign a bayesian probability to a connection between two nodes, if in fact you can be sure that the nodes exist at all! (“Bayesian networks” already has an established meaning different from the one I’m using here, though real Bayesian networks are actually closely related to my ZMDS method.)
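To make the distinction concrete, here is a minimal sketch of what a “bayesian” network might look like in code. The song names, probabilities, and threshold are all hypothetical illustrations: each link carries a confidence rather than being a binary fact, and the network you end up analyzing depends on where you set the cutoff.

```python
# Edges keyed by node pair, valued by the probability that the link is
# "real", e.g. estimated from how often two songs co-occur in playlists.
# (All values here are made up for illustration.)
edges = {
    ("song_a", "song_b"): 0.92,  # co-occurs in many playlists
    ("song_a", "song_c"): 0.35,  # seen together a handful of times
    ("song_b", "song_d"): 0.05,  # plausibly just human error
}

def likely_links(edges, threshold):
    """Return only the links believed to exist above a given confidence."""
    return {pair for pair, p in edges.items() if p >= threshold}

# The "network" you analyze depends entirely on the cutoff you choose.
print(likely_links(edges, 0.5))       # only the strong song_a/song_b link
print(len(likely_links(edges, 0.1)))  # 2
```

An “essential” network, by contrast, would just be the set of pairs, with no thresholding step at all.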
Examples of this type of network include content networks, which describe connections between media derived from human action (aggregating words into documents, songs into playlists, etc). The problem is that humans make errors, and can’t be relied on to cover every conceivable “node” or “connection” that belongs in the requisite network.
Therefore, a lot of the measurements that rely on “essential” qualities of networks must be taken with a grain of salt. Such measurements include betweenness, clustering coefficient, diameter, etc. The main argument here (for bayesian networks as I’ve described them) is that, in my chosen music domain, a link WILL exist between any two nodes given a network sample of large enough size. Why? Because it is impossible to completely remove the elements of human error that form these links, and it’s also conceivable that in some person’s mind, a valid link does exist between the two nodes.
The one thing these networks have going for them is the weighting on the edges between nodes. Network measurements that make more sense for these kinds of networks include “flow” measurements and the like.
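One flow-style measurement that uses only the edge weights is a most-reliable-path search: treat each weight as an independent link probability and find the path whose product is largest, via Dijkstra on negated log-weights. This is a sketch of my own, not something presented at the conference, and the song graph is hypothetical.

```python
import heapq
from math import log

def most_reliable_path(edges, src, dst):
    """Max-product path: treat each edge weight as an independent
    link probability; run Dijkstra on -log(weight) so that the
    cheapest path is the most probable one."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    heap = [(0.0, src, [src])]  # (accumulated -log prob, node, path)
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in adj.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost - log(w), nxt, path + [nxt]))
    return None

# Hypothetical co-occurrence confidences between songs.
edges = {
    ("a", "b"): 0.9, ("b", "c"): 0.9,  # two strong hops: 0.9 * 0.9 = 0.81
    ("a", "c"): 0.5,                   # one weak direct hop
}
print(most_reliable_path(edges, "a", "c"))  # ['a', 'b', 'c']
```

The point is that the answer depends only on the weights, not on any claim that the unweighted structure is complete or correct.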
So… I’ve been a little bit disappointed. I haven’t found any “silver bullets” here that are going to change the way I think about networks. However, it’s been incredibly instructive and informative, and perhaps the most promising leads relate to relationships between networks, such as the relationships between music and the people that listen to them.
*update: OK, there was an interesting talk on “local betweenness centrality” by some researchers from Penn State. This work is related to work by Kleinberg et al., and builds on the notion of edge weights. I think it’s interesting to note the indexing problems that one runs into here, particularly for hub nodes. It seems like for practical purposes, it would be useful to build eigenvector indices of neighbors for hub nodes, rather than reading a huge list of neighbors into memory. This would allow for a certain degree of “graceful degradation” with respect to the structural indexing of the hub nodes… i.e. you could return a sampling of neighbor nodes based on their structural characteristics with respect to the hub node, rather than their global degree or other metric. This would ensure (at least in the case of search) that you were following promising paths that would lead to structurally distinct parts of the network.
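As a rough guess at what such a sampling could look like (this is my own toy stand-in, not the Penn State method, and a real index would of course be precomputed offline rather than built on the fly): score each neighbor of a hub by how much its own neighborhood overlaps the hub’s, and return the least-overlapping ones first, since those are most likely to lead somewhere structurally new.

```python
def distinct_neighbor_sample(adj, hub, k):
    """Return the k neighbors of `hub` whose own neighborhoods
    overlap least with the hub's neighborhood -- a crude proxy for
    "structurally distinct" directions to explore, instead of
    loading the hub's full (possibly huge) neighbor list."""
    hub_nbrs = adj[hub]
    def overlap(n):
        other = adj[n] - {hub}
        return len(other & hub_nbrs) / max(len(other), 1)
    return sorted(hub_nbrs, key=overlap)[:k]

# Toy graph: a and b are redundant with the hub's neighborhood,
# while c is the only neighbor that leads out of it.
adj = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub", "x"},
    "x": {"c"},
}
print(distinct_neighbor_sample(adj, "hub", 1))  # ['c']
```

Whether an eigenvector-based index would rank neighbors the same way is an open question; this just illustrates the “follow structurally distinct paths” intuition.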