Wednesday, October 24, 2012

Deploying a GraphLab Cluster using MPI

Note: the MPI section of this toturial is based on this excellent tutorial.

Preliminaries:


  •  Mpi should be installed


Step 0: Install GraphLab on one of your cluster nodes. 


Using the instructions here on your master node (one of your cluster machines)


Step 1: start MPI

a) Create a file called .mpd.conf in your home directory (only once)
This file should contain a secret password using the format:
secretword=some_password_as_you_wish

b) Verify that MPI is working by running the daemon (only once)
$ mpd &       # run the MPI daemon
$ mpdtrace   # lists all active MPI nodes
$ mpdallexit # kill MPI

c) Spawn the cluster. Create a file named machines with the list of machines you like to deploy.
Run:
$ mpd -f machines -n XX  # where XX is the number of MPI nodes you like to spawn.

d) Copy GraphLab files to all machines.
On the node you installed GraphLab on, run the following commands to copy GraphLab files to the rest of the machines:

d1) Verify you have the machines files from section 1c) in your root folder.

d2) Copy the GraphLab files
 cd ~/graphlabapi/release/toolkits;  ~/graphlabapi/scripts/mpirsync 
  cd ~/graphlabapi/deps/local; ~/graphlabapi/scripts/mpirsync 

Step 2: Run GraphLab ALS

This step runs ALS (alternating least squares) in a cluster using small netflix susbset.
It first downloads the data from the web: http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm.train and http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm.validate, and runs 5 alternating least squares iterations. After the run is completed, you can login into any of the nodes and view the output files in the folder ~/graphlabapi/release/toolkits/collaborative_filtering/ 
The algorithm operation is explained in detail here

cd /some/ns/folder/
mkdir smallnetflix 
cd smallnetflix/ 
wget http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm.train 
wget http://www.select.cs.cmu.edu/code/graphlab/datasets/smallnetflix_mm.validate 

Now run GraphLab:

mpiexec -n 2 /path/to/als --graph /some/ns/folder/ --max_iter=3 --ncpus=1

Where -n is the number of MPI nodes, and --ncpus is the number of deployed cores on each MPI node.

Note: this section assumes you have a network storage (ns) folder where the input can be stored.
Alternatively, you can split the input into several disjoint files, and store the subsets on the cluster machines.

Note: Don't forget to change /path/to/als and /some/ns/folder to your actual folder path!


No comments:

Post a Comment