Tuesday, February 1, 2011

Mahout on CMU OpenCloud

This post explains how to run Mahout on top of CMU OpenCloud.

1) Log in to the cloud login node, forwarding local port 8888 to proxy.opencloud:8888:
ssh -L 8888:proxy.opencloud:8888 login.cloud.pdl.cmu.local.
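
The -L option forwards a local port through the login node: connections to port 8888 on your own machine are relayed to proxy.opencloud:8888. If port 8888 is already in use locally, only the first number needs to change. A minimal sketch of that idea follows; the helper name tunnel_cmd is made up for illustration, and it only prints the command rather than running it.

```shell
# Hypothetical helper: print the ssh command from step 1 for a chosen
# local port (default 8888). Host names are copied from the post as written.
tunnel_cmd() {
    local_port="${1:-8888}"
    printf 'ssh -L %s:proxy.opencloud:8888 login.cloud.pdl.cmu.local.\n' "$local_port"
}
```

For example, tunnel_cmd 9999 prints the same command with local port 9999 in place of 8888.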

2) Copy the Mahout directory tree (mahout-0.4 in this example) into your home folder.
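
The later steps call bin/mahout and examples/bin/build-reuters.sh relative to the tree, so a quick check after copying can catch an incomplete copy. This is only a sketch, assuming the tree ends up in ~/mahout-0.4; check_mahout_tree is a hypothetical helper name, not part of Mahout.

```shell
# Hypothetical helper: succeed only if the given directory contains the
# two scripts used later in this post.
check_mahout_tree() {
    dir="${1:-$HOME/mahout-0.4}"
    for f in bin/mahout examples/bin/build-reuters.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    echo "ok: $dir"
}
```

Run it as check_mahout_tree ~/mahout-0.4; a non-zero exit status means one of the two files was not found.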

3) Run the Mahout Reuters example:
cd mahout-0.4/
export JAVA_HOME=/usr/lib/jvm/java-6-sun/
./examples/bin/build-reuters.sh
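
The JAVA_HOME path above is specific to the Java install on this cluster, so it may be worth confirming that it points at a working JDK before starting a job that takes several minutes. A small sketch, with java_home_ok as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if $JAVA_HOME points at a directory
# containing an executable bin/java.
java_home_ok() {
    if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME%/}/bin/java" ]; then
        echo "JAVA_HOME ok: $JAVA_HOME"
    else
        echo "JAVA_HOME is unset or has no bin/java: ${JAVA_HOME:-}" >&2
        return 1
    fi
}
```

One way to use it is java_home_ok && ./examples/bin/build-reuters.sh, so the script only starts when the check passes.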

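You should see output similar to the following. This trace was captured with sh -x, which prints each command prefixed with +: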
sh -x ./examples/bin/build-reuters.sh
11/02/01 15:13:27 INFO driver.MahoutDriver: Program took 225915 ms
+ ./bin/mahout seqdirectory -i ./examples/bin/work/reuters-out/ -o ./examples/bin/work/reuters-out-seqdir -c UTF-8 -chunk 5
Running on hadoop, using HADOOP_HOME=/usr/local/sw/hadoop
HADOOP_CONF_DIR=/etc/hadoop/conf/global
11/02/01 15:13:38 INFO driver.MahoutDriver: Program took 10087 ms
+ ./bin/mahout seq2sparse -i ./examples/bin/work/reuters-out-seqdir/ -o ./examples/bin/work/reuters-out-seqdir-sparse
Running on hadoop, using HADOOP_HOME=/usr/local/sw/hadoop
HADOOP_CONF_DIR=/etc/hadoop/conf/global
11/02/01 15:13:40 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
11/02/01 15:13:40 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
11/02/01 15:13:40 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
11/02/01 15:13:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/02/01 15:13:42 INFO input.FileInputFormat: Total input paths to process : 3
11/02/01 15:13:47 INFO mapred.JobClient: Running job: job_201101170028_1733
11/02/01 15:13:48 INFO mapred.JobClient:  map 0% reduce 0%
11/02/01 15:17:49 INFO mapred.JobClient:  map 33% reduce 0%
11/02/01 15:17:55 INFO mapred.JobClient:  map 66% reduce 0%
11/02/01 15:18:01 INFO mapred.JobClient:  map 100% reduce 0%
11/02/01 15:18:08 INFO mapred.JobClient: Job complete: job_201101170028_1733
11/02/01 15:18:08 INFO mapred.JobClient: Counters: 6
11/02/01 15:18:08 INFO mapred.JobClient:   Job Counters
11/02/01 15:18:08 INFO mapred.JobClient:     Rack-local map tasks=5
11/02/01 15:18:08 INFO mapred.JobClient:     Launched map tasks=5
11/02/01 15:18:08 INFO mapred.JobClient:   FileSystemCounters
11/02/01 15:18:08 INFO mapred.JobClient:     HDFS_BYTES_READ=13537042
11/02/01 15:18:08 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=11047110
11/02/01 15:18:08 INFO mapred.JobClient:   Map-Reduce Framework
11/02/01 15:18:08 INFO mapred.JobClient:     Map input records=16115
11/02/01 15:18:08 INFO mapred.JobClient:     Spilled Records=0
11/02/01 15:18:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/02/01 15:18:09 INFO input.FileInputFormat: Total input paths to process : 3
11/02/01 15:18:15 INFO mapred.JobClient: Running job: job_201101170028_1736
11/02/01 15:18:16 INFO mapred.JobClient:  map 0% reduce 0%
...
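
Each stage in the trace ends with a "Program took N ms" line, and each submitted Hadoop job is announced with "Running job: job_...". If the output is saved to a file, both can be pulled out with standard tools. A sketch under that assumption; the file name reuters.log and both function names are hypothetical.

```shell
# Sum the "Program took N ms" durations read from standard input.
total_program_ms() {
    awk '/Program took [0-9]+ ms/ {
             for (i = 1; i < NF; i++)
                 if ($i == "took") total += $(i + 1)
         }
         END { print total + 0 }'
}

# List the Hadoop job IDs announced in the log, one per line.
job_ids() {
    sed -n 's/.*Running job: \(job_[0-9_]*\).*/\1/p'
}
```

Typical use would be total_program_ms < reuters.log and job_ids < reuters.log. For the excerpt above, the two reported durations add up to 225915 + 10087 = 236002 ms, and the job IDs are job_201101170028_1733 and job_201101170028_1736.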
