{"id":139,"date":"2019-10-05T13:28:39","date_gmt":"2019-10-05T13:28:39","guid":{"rendered":"http:\/\/www.keithdillon.com\/?p=139"},"modified":"2023-05-15T03:51:53","modified_gmt":"2023-05-15T03:51:53","slug":"clustering-gaussian-graphical-models","status":"publish","type":"post","link":"https:\/\/www.keithdillon.com\/index.php\/2019\/10\/05\/clustering-gaussian-graphical-models\/","title":{"rendered":"Clustering Gaussian Graphical Models"},"content":{"rendered":"\r\n<p>We derive an efficient method to perform clustering of nodes in Gaussian graphical models directly from sample data. Nodes are clustered based on the similarity of their network neighborhoods, with edge weights defined by partial correlations. In the limited-data scenario, where the covariance matrix would be rank-deficient, we work directly with matrix factors and never need to form the covariance or precision matrix explicitly. We demonstrate the method on functional MRI data from the Human Connectome Project. A Matlab implementation of the algorithm is provided.<\/p>\r\n\r\n\r\n\r\n<p><a href=\"https:\/\/arxiv.org\/pdf\/1910.02342\">https:\/\/arxiv.org\/pdf\/1910.02342<\/a><\/p>\r\n<p>For the Matlab code, read &#8216;more&#8217;<\/p>\r\n<p><!--more--><\/p>\r\n<pre>A = randn(500,100000); % data matrix (samples x nodes)<br \/>lambda = 1; % regularization parameter<br \/>k = 100; % number of clusters<br \/><br \/>[rows_A,cols_A] = size(A);<br \/><br \/>% standardize data columns (center, then scale to unit norm)<br \/>A = bsxfun(@minus,A,mean(A));<br \/>A = bsxfun(@times,A,1.\/sum(A.^2).^.5);<br \/><br \/>% compute diagonal of R via sums of squared right singular vectors<br \/>[uA,sA,vA] = svd(A,'econ');<br \/>rank_A = sum(diag(sA) &gt; max(size(A))*eps(sA(1))); % numerical rank from the singular values already computed (avoids a second SVD inside rank(A))<br \/>r = sum(vA(:,1:rank_A).^2,2)';<br \/><br \/>% compute regularized pseudoinverse efficiently<br \/>iA_lambda = A'\/(A*A'+lambda*eye(rows_A)); % equals (A'*A+lambda*I)^-1 * A' by the push-through identity<br \/><br \/>% compute scaling vectors (symmetric version)<br \/>s = 1.\/(1-r(:));<br \/>z = abs(s(:)).^.5;<br \/>zeta = sign(s).*abs(s(:)).^.5;<br \/><br \/>Az = bsxfun(@times,A,z(:)');<br \/><br \/>% randomly assign 
columns to clusters initially<br \/>c = ceil(rand(cols_A,1)*k);<br \/><br \/>n_change = inf;<br \/>while (n_change&gt;0)<br \/>  M = sparse(1:cols_A,c,1,cols_A,k,cols_A); % cols of M are masks of clusters<br \/>  M = bsxfun(@times, M, 1.\/max(sum(M,1),1)); % now M is averaging operator (guarded against empty clusters)<br \/><br \/>  P_c_1 = iA_lambda*(Az*M); % first part of cluster center calc<br \/>  P_c_2 = bsxfun(@times,M,r(:).*zeta); % second part (peak removal)<br \/>  P_c = bsxfun(@times,P_c_1-P_c_2,z(:)); % cluster centers<br \/><br \/>  Pz2_c = sum(P_c.^2,1); % squared term from distance<br \/><br \/>  Cz = bsxfun(@times,P_c,z(:)); % weighted cluster centers<br \/>  D_ct1 = (Cz'*iA_lambda)*Az; % first part of cross-term<br \/>  D_ct2 = bsxfun(@times,Cz',r.*zeta(:)'); % second part of cross-term<br \/>  D_ct = D_ct1-D_ct2; % cross-term<br \/><br \/>  Dz = bsxfun(@minus,D_ct,.5*Pz2_c'); % dist metric (sans unnecessary term)<br \/><br \/>  c_old = c;<br \/>  [D_max,c(:)] = max(Dz,[],1); % c is arg of max<br \/><br \/>  n_change = sum(c~=c_old);<br \/>  disp(n_change);<br \/>end;<\/pre>\r\n","protected":false},"excerpt":{"rendered":"<p>We derive an efficient method to perform clustering of nodes in Gaussian graphical models directly from sample data. Nodes are clustered based on the similarity of their network neighborhoods, with edge weights defined by partial correlations. 
In the limited-data scenario,<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"_links":{"self":[{"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/posts\/139"}],"collection":[{"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/comments?post=139"}],"version-history":[{"count":6,"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/posts\/139\/revisions"}],"predecessor-version":[{"id":324,"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/posts\/139\/revisions\/324"}],"wp:attachment":[{"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/media?parent=139"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/categories?post=139"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.keithdillon.com\/index.php\/wp-json\/wp\/v2\/tags?post=139"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}