We derive an efficient method to perform clustering of nodes in Gaussian graphical models directly from sample data. Nodes are clustered based on the similarity of their network neighborhoods, with edge weights defined by partial correlations. In the limited-data scenario, where the covariance matrix would be rank-deficient, we are able to make use of matrix factors, and never need to estimate the actual covariance or precision matrix. We demonstrate the method on functional MRI data from the Human Connectome Project. A matlab implementation of the algorithm is provided.


For Matlab code, read ‘more’

A = randn(500,100000); % data matrix
lambda = 1; % regularization parameter
k = 100; % number of clusters

[rows_A,cols_A] = size(A);

% standardize data columns
A = bsxfun(@minus,A,mean(A));
A = bsxfun(@times,A,1./sum(A.^2).^.5);

% compute diagonal of R via sum of squared eigenvectors
[uA,sA,vA] = svd(A,'econ');
r = sum(vA(:,1:rank(A)).^2,2)';

% compute pseudoinverse efficiently
iA_lambda = A'*inv(A*A'-lambda*eye(rows_A));

% compute scaling vectors (symmetric version)
s = 1./(1-r(:));
z = abs(s(:)).^.5;
zeta = sign(s).*abs(s(:)).^.5;

Az = bsxfun(@times,A,z(:)');

% randomly assign columns to clusters initially
c = ceil(rand(cols_A,1)*k);

n_change = inf
while (n_change>0)
M = sparse(1:cols_A,c,1,cols_A,k,cols_A); % cols of M are masks of clusters
M = bsxfun(@times, M, 1./sum(M,1)); % now M is averaging operator

P_c_1 = iA_lambda*(Az*M); % first part of cluster center calc
P_c_2 = bsxfun(@times,M,r.*zeta); % second park (peak removal)
P_c = bsxfun(@times,A_c_1-A_c_2,z(:)); % cluster centers

Pz2_c = sum(P_c.^2,1); % squared term from distance

Cz = bsxfun(@times,P_c,z(:)); % weighted cluster centers
D_ct1 = (Cz'*iA_lambda)*Az; % first part of cross-term
D_ct2 = bsxfun(@times,Cz',r'.*zeta(:)'); % second part of cross term
D_ct = D_ct1-D_ct2; % cross-term

Dz = bsxfun(@minus,D_ct,.5*Pz2_c'); % dist metric (sans unnecessary term)

c_old = c;
[D_max,c(:)] = max(Dz,[],1); % c is arg of max

n_change = sum(c~=c_old);
Clustering Gaussian Graphical Models