A random walk from eigenvectors to parallel page rank

Power law algorithms are ideal for computing things like page rank. That isn't obvious, however.

The basic outline idea is that hub and authority style algorithms are intimately related to eigenvector or singular value decompositions (depending on whether the links are symmetrical). This also means that there is a close relationship to asymptotic beahavior of random walks on the graph. That probably still isn't obvious, so let's dig in.

If you represent the linkage in the web by a matrix that has columns representing source page and rows representing the target page and with a 1 where-ever the source page has a link pointing to the target page, then if you start with a vector with a single non-zero element equal to 1 as a representation of a single page, then multiplying by the linkage matrix will give you a vector with 1 in the positions corresponding to the pages the original page linked to. If you multiply again, you get all the pages that you can get to in two steps from the original page.

Mathematically, if we call the original vector x and the linkage matrix A, the pages that x links to are just Ax. The pages that are two steps from x are A(Ax) = A² x.

The eigenvector decomposition of A is just a way of writing A as a product of three matrices:

A = U S U'

U' is the transpose of U, and U has the special property that U'U = I (it is called ortho-normal because of this).

S is a diagonal matrix.

There is lots of deep mathematical machinery and beautiful symmetry available here, but for now we can just take this as given.

The set of pages n steps from x are

x_n = Aⁿ x = (U S U')ⁿ x = (U S U')^n-2 (U S U') (U S U') x
= (U S U')^n-2 (U S (U'U) S U') x = (U S U')^n-2 (U S² U') x
= U Sⁿ U' x

This is really cool because Sⁿ can be computed by just taking each diagonal element and raising it to a power.

Eigenvector decompositions have other, really deep connections. For instance, if you take the elements of S (call the i-th one s_i) then

$\displaystyle\sum$ _i s_iⁿ

is the number of paths that are n steps long.

Connected (or nearly connected) clusters of pages can also be derived from the eigenvector decomposition. This is the basis of so-called spectral clustering. For some very impressive examples of spectral clustering see this paper.

So eigenvectors are cool. But how can we compute them? And how can we compute them on BIG graphs in parallel?

First, note that if Aⁿ = U Sⁿ U' and if some of the s_i are bigger than others, the big ones will quickly dominate the others. That is pretty quickly, Aⁿ $\approx$ u₁ s₁ⁿ u₁'. This means that we can compute an approximation of u₁ by just doing Aⁿ x where x is some random vector. Moreover, we can compute u₂ by starting with a different random vector and iterating the same way, but with an additional step where we forbid the result from going towards u₁. With just a few additional wrinkles, this gives us what is called the Lanczos algorithm. Golub and van Loan's excellent book Matrix Computations gives a lot of information on these algorithms.

The cool thing here is that our random vector can represent a single page and we can approximate the final result by following links. Following links is just a (human-readable) way of saying sparse matrix multiplication. If we do this multiplication against lots of different random starting points, we can quickly build parallel algorithms to compute things like page rank.

Surprise and Coincidence - musings from the long tail

Tuesday, April 15, 2008