Dec 14,2011: Upload the example of test data and commands. Please check Download.
Analysis of large networks is a critical component of many of today's application environments, including online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive network graphs with hundreds of millions of nodes, e.g. social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadth-first-search (BFS).
In this project, we study ways to enable scalable graph processing for today's massive networks. We explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system that lends itself to efficient parallelization across a compute cluster. Rigel produces significantly more accurate results than prior systems, and is naturally parallelizable across compute clusters, allowing it to provide accurate results for graphs up to 43 million nodes. Finally we show that Rigel's functionality can be easily extended to locate (near-) shortest paths between node pairs. After a one-time preprocessing cost, Rigel answers node-distance queries in 10's of microseconds, and also produces shortest path results up to 18 times faster than prior shortest-path systems with similar levels of accuracy.
|
|
|
Input parameters:
-l: The input distance file, which is a shortest distance matrix from the landmarks to all the other nodes. Format: each line stands for a landmark and all the numbers are the distances from the this landmark to all nodes in this graph which are separated by space.
-b: The number of landmarks which is used by each node to compute coordinates.
-x: The number of space dimension
-m: The parameter file name.
Format of parameter file:
line 1: #OfLandmark #OfHighestDegreeLandmark #OfTotalNode
line 2: #OfFirstJoinLandmark #OfClosestLandmark #OfNeighbors #OfHighestLandmark
Example:
100 100 246692
16 16 0 16
-t: The prefix of files. The appendix of these files are ".num" and ".ord".
file.num is a file to record the number of nodes in each tree(file.ord). Format: TreeID #OfNumber
file.ord is a file of input node ID. Format: InputNodeFakeID -1.
-r: The prefix of files about landmarks. The appendix of these files are ".mark", ".bas" and ".sec". file.mark is a file of landmark information. Format: InputOrder realID fakeID.
file.bas is a file of landmark order number which first join the system (please refer to our paper for more details). Format: InputOrder.
file.sec is FakeID of secondary landmark as secondary landmark input order to our system. Format: FakeID
-w: The computation method of landmark. Here it should be 1
-e: The curvature of Hyperbolic space (a negative value)
-v: The file which save the coordinates of landmarks.
-y: The number of file/tree which starts to compute. If this number is -1, it will only compute the coordinates for landmarks.
-n: The number of file/tree which ends to compute.
-o: The prefix of output files.
For each tree/file, there are two files, ".time" and ".coord".
file+treeID.time will record the computation time.
file+treeID.coord will record the coordinates of the nodes in this tree.
For landmark computation, there are two files, ".time" and ".land".
file.time is a file to record computation time.
file.land is a file to record the coordinates of all landmarks.
Format of ".coord" and ".land" files:
each line: fakeID DIM1 DIM2...
Note: Since the node ID in real graphs may not be consecutive, we call this id is realID; in our code, all the node IDs should start from 0 and be consecutive which we call this id is fakeID.
The code is written in C++ and is compiled by g++ in Linux. Makefile is included in this version. Thus, "make" command is used to compile this code. Click here to download Rigel.
Download the example of test data and command. Click here.(Note: this data is randomly generated, not from real graphs. The usage of them is to know how to run this code. You could use the commands in command.txt to run this code.)